Methods for scalable data assimilation

Colin Grudzien

Assistant Professor of Statistics

University of Nevada, Reno

Presenting work in collaboration with

Marc Bocquet¹, Alberto Carrassi², Chris KRT Jones³ et al.

  1. CEREA, a joint laboratory of École des Ponts ParisTech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France.
  2. University of Reading, Department of Meteorology and NCEO, Reading, United Kingdom.
  3. University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Introduction

  • I have been an Assistant Professor of Statistics at the University of Nevada, Reno since January 2019;

    • remarkably, over half this time has been spent facing a global pandemic.
  • Despite the challenges, I have pushed forward an innovative research and teaching program by utilizing my core strengths:

    1. my deep training in fundamental mathematics and statistics;
    2. my experience in computing and software development; and
    3. my adaptability and resourcefulness in facing new challenges and mastering new techniques.
  • In addition to my technical experience, I am highly collaborative and I enjoy the camaraderie and sense of purpose in working as part of a tight-knit team.

  • I will share how my experience and accomplishments demonstrate these skills;

    • furthermore, I will discuss the value I will add to the team at CW3E, and how this role will advance my own professional goals.

General candidate background

  • In 2011, I received my BSc from the University of Oregon, magna cum laude, with majors in Mathematics and History.
    • For my broad accomplishments in the liberal arts, I was elected one of the Oregon Six by the Alpha of Oregon Chapter of Phi Beta Kappa.
  • I received my PhD in Mathematics from the University of North Carolina at Chapel Hill (UNC-CH) in 2016.
    • My graduate study was primarily funded through the Mathematics and Climate Research Network, through which I organized a virtual data assimilation seminar with the Nansen Environmental and Remote Sensing Center (NERSC) and participants throughout the United States.
  • As a graduate student, I held a visiting Graduate Research Assistantship at the Los Alamos National Laboratory (LANL).
  • I was also awarded the UNC-CH Off Campus Dissertation Fellowship to complete my dissertation at NERSC.
  • After completing my dissertation, I returned to NERSC as a postdoctoral researcher, including a period as a visiting researcher at CEREA, a joint laboratory of École des Ponts ParisTech and EDF R&D.

Research projects and themes

  • My research focus is on scalable data assimilation methodology;

    • I develop theory for ensemble-based estimators in high-dimensional, chaotic dynamical systems characteristic of weather and climate.
  • I utilize dynamical, statistical and numerical tools for understanding:

    • the stability and convergence of these estimators;
    • the robustness and reliability of their inferences in the presence of modeling errors and bias;
    • and the numerical efficiency of these schemes in the forecast cycle, for online predictions and real-time analysis.
  • Originally trained in pure mathematics, I approach my work by building on that foundation while continually expanding the depth and breadth of my expertise.

  • My dissertation studied the stability of special solutions to PDEs, such as nonlinear waves, using geometric and dynamical systems tools.¹,²

    • Together with my work at LANL,³ this gave me a foundation in the dynamical and numerical modeling of physical systems.
  • I then applied my training in dynamical systems to develop a novel stability analysis of the Kalman filter (KF) and the ensemble Kalman filter (EnKF).

1. Grudzien, C., Bridges, T., & Jones, C. (2016). Geometric phase in the Hopf bundle and the stability of non-linear waves. Physica D: Nonlinear Phenomena, 334, 4-18.
2. Grudzien, C. (2016). The instability of the Hocking–Stewartson pulse and its geometric phase in the Hopf bundle. JCAM, 307, 162-169.
3. Grudzien, C., Deka, D., Chertkov, M., & Backhaus, S. (2018). Structure- and Physics-Preserving Reductions of Power Grid Models. SIAM MMS, 16(4), 1916-1947.

Assimilation in the unstable subspace (AUS)

  • Numerical and empirical results have long demonstrated that the skill of ensemble data assimilation (DA) methods in chaotic systems is strongly related to dynamical instabilities.⁴,⁵,⁶

    • In particular, the uncertainty of ensemble-based estimators is strongly related to the multiplicity, strength, and observability of the dynamical instabilities.⁷
  • Trevisan et al. proposed a dimension-reduction filtering methodology called Assimilation in the Unstable Subspace (AUS).⁸

  • AUS gave an intuitive explanation for results in targeting observations to constrain forecast error growth.

    • AUS also helped explain the success of the EnKF in full-scale geophysical models, where the ensemble size is far smaller than the state and observation dimensions.
  • However, AUS lacked a mathematical formalism that would allow these results to extend beyond the “perfect” model assumption in DA.

  • During my PhD, my collaborators and I established the fundamental mathematical theory for these dimension-reduction results.

4. Buizza, R., Tribbia, J., Molteni, F., & Palmer, T. (1993). Computation of optimal unstable structures for a numerical weather prediction model. Tellus A, 45(5), 388-407.
5. Toth, Z., & Kalnay, E. (1997). Ensemble forecasting at NCEP and the breeding method. Monthly Weather Review, 125(12), 3297-3319.
6. Legras, B., & Vautard, R. (1996). A guide to Liapunov vectors. Proceedings 1995 ECMWF seminar on predictability. Vol. 1.
7. Carrassi, A., Trevisan, A., & Uboldi, F. (2007). Adaptive observations and assimilation in the unstable subspace by breeding on the data-assimilation system. Tellus A, 59(1), 101-113.
8. Palatella, L., Carrassi, A., & Trevisan, A. (2013). Lyapunov vectors and assimilation in the unstable subspace: theory and applications. Journal of Physics A: Mathematical and Theoretical, 46(25), 254020.
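The unstable and neutral directions discussed on this slide are usually identified through Lyapunov exponents and vectors. As a hedged illustration of a generic textbook technique (not code from the cited works), the sketch below estimates Lyapunov exponents with the standard QR re-orthonormalization method, assuming the tangent-linear propagators are available as dense NumPy arrays; the function name `lyapunov_exponents_qr` is hypothetical.

```python
import numpy as np


def lyapunov_exponents_qr(propagators):
    """Estimate Lyapunov exponents (per step) from tangent-linear propagators.

    propagators: sequence of (n x n) arrays M_0, M_1, ..., M_{K-1}.
    Returns (exponents, Q), where Q is the final orthonormal basis; the span of
    its leading columns approximates the fastest-growing tangent subspace.
    """
    propagators = list(propagators)
    n = propagators[0].shape[0]
    Q = np.eye(n)
    log_growth = np.zeros(n)
    for M in propagators:
        # Propagate the orthonormal basis one step, then re-orthonormalize.
        Q, R = np.linalg.qr(M @ Q)
        # |diag(R)| holds the local stretching factor along each direction.
        log_growth += np.log(np.abs(np.diag(R)))
    return log_growth / len(propagators), Q
```

For a constant propagator the estimates approach the logarithms of the eigenvalue moduli in decreasing order; with `np.diag([2.0, 1.0, 0.5])` they are log 2, 0 and log 0.5, marking one unstable, one neutral and one stable direction.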

Assimilation in the unstable subspace (AUS)

  • We showed that, under weakly nonlinear ensemble forecast error dynamics and a perfect model assumption, the (ensemble) Kalman filter covariances collapse to the span of the unstable and neutral Lyapunov vectors.⁹
    • Furthermore, provided the dynamical instabilities are sufficiently observed, this is the stable asymptotic solution for all ensemble initializations, under a generic model initialization.
Figure: eigenvalue profile.
  • This provides a direct dimensional reduction of the DA / uncertainty quantification problem in terms of:
    • the optimality of the low-rank covariance; and
    • the low-dimensional observation model, when optimizing information to constrain dynamic error growth.
  • My postdoctoral work extended these results to the case of stochastic modeling errors in the prediction cycle.
9. Gurumoorthy, K. S., Grudzien, C., Apte, A., Carrassi, A., & Jones, C. K. (2017). Rank deficiency of Kalman error covariance matrices in linear time-varying system with deterministic evolution. SIAM Journal on Control and Optimization, 55(2), 741-759.
10. Bocquet, M., Gurumoorthy, K. S., Apte, A., Carrassi, A., Grudzien, C., & Jones, C. K. (2017). Degenerate Kalman filter error covariances and their convergence onto the unstable subspace. SIAM/ASA Journal on Uncertainty Quantification, 5(1), 304-333.
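A toy illustration of this collapse, constructed here rather than taken from the cited papers: a perfect (noise-free) linear model with one unstable, one neutral and two stable directions, fully observed with unit error variance. Iterating the Kalman filter covariance recursion (the discrete Riccati equation) drives the stable-direction variances to zero while the unstable variance stays bounded away from zero. The function name and the specific matrices are illustrative assumptions.

```python
import numpy as np


def kf_covariance_recursion(M, H, R, P0, n_cycles):
    """Iterate Kalman filter analysis covariances for a perfect linear model
    x_{k+1} = M x_k observed through y_k = H x_k + noise with covariance R."""
    P_a = np.array(P0, dtype=float)
    identity = np.eye(P_a.shape[0])
    for _ in range(n_cycles):
        # Forecast step: no model-error covariance Q is added (perfect model).
        P_f = M @ P_a @ M.T
        # Analysis step: Kalman gain and covariance update.
        S = H @ P_f @ H.T + R
        K = P_f @ H.T @ np.linalg.inv(S)
        P_a = (identity - K @ H) @ P_f
        P_a = 0.5 * (P_a + P_a.T)  # symmetrize against round-off
    return P_a


# Growth factors 1.5 (unstable), 1.0 (neutral), 0.5 and 0.2 (stable).
M = np.diag([1.5, 1.0, 0.5, 0.2])
P = kf_covariance_recursion(M, np.eye(4), np.eye(4), np.eye(4), n_cycles=200)
```

Here the unstable variance settles at 5/9, the neutral variance decays like 1/(k+1) (1/201 after 200 cycles), and the stable variances decay geometrically toward zero. Adding a model-error covariance Q in the forecast step gives the stochastic-model-error setting mentioned above, in which the stable-direction variances no longer vanish.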

Stochastic model error in the DA cycle

  • In dynamics with stochastic model errors, I derived novel boundedness / stability results for the (ensemble) KF covariance.
    • These results tie estimator performance directly to how well the observation model constrains the dynamical instabilities.¹¹
Figure: observation operator comparison.
  • A primary issue for low-rank estimators is unrepresented model error outside the span of the ensemble covariance.
  • I derived a mathematical proof of concept for adaptive covariance inflation to correct the resulting systematic underestimation of uncertainty.¹²
Figure: adaptive inflation.
11. Grudzien, C., Carrassi, A., & Bocquet, M. (2018). Asymptotic forecast uncertainty and the unstable subspace in the presence of additive model error. SIAM/ASA Journal on Uncertainty Quantification, 6(4), 1335-1363.
12. Grudzien, C., Carrassi, A., & Bocquet, M. (2018). Chaotic dynamics and the role of covariance inflation for reduced rank Kalman filters with model error. Nonlinear Processes in Geophysics, 25(3), 633-648.
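A minimal sketch of multiplicative covariance inflation with one simple adaptive choice of the factor: a moment-matching estimate from a single innovation d = y - H x̄, based on E[dᵀd] = λ² tr(HPHᵀ) + tr(R) when the ensemble covariance P underestimates the forecast error covariance by a factor λ² and forecast and observation errors are unbiased and mutually uncorrelated. This is a generic illustration under those assumptions, not the scheme analyzed in the cited paper; the function names are hypothetical.

```python
import numpy as np


def inflate_anomalies(E, lam):
    """Scale ensemble anomalies about the mean by lam (columns are members);
    the sample covariance is multiplied by lam**2 and the mean is unchanged."""
    mean = E.mean(axis=1, keepdims=True)
    return mean + lam * (E - mean)


def estimate_inflation(d, HPHt, R):
    """Moment-matching estimate of the anomaly inflation factor from one
    innovation d, assuming E[d d^T] = lam**2 * H P H^T + R."""
    lam_sq = (d @ d - np.trace(R)) / np.trace(HPHt)
    # Single-sample estimates are noisy; never deflate below lam = 1.
    return float(np.sqrt(max(lam_sq, 1.0)))


rng = np.random.default_rng(0)
E = rng.standard_normal((3, 20))  # 3 state variables, 20 ensemble members
lam = estimate_inflation(np.array([3.0, 4.0]), np.eye(2), np.eye(2))
E_inflated = inflate_anomalies(E, lam)
```

In practice such single-innovation estimates are usually smoothed in time; the clipping at λ = 1 reflects the motivation above, which is to correct underestimation rather than overestimation.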

Stochastic model error in the DA cycle

  • Furthermore, I studied how the combined effects of numerical precision and model uncertainty in stochastic dynamics can bias ensemble DA statistics.¹³
  • To rigorously understand these effects and their influence on filter stability / divergence, I developed a novel second-order Taylor-Stratonovich scheme for the Lorenz-96 model with additive noise, against which to benchmark the simulation statistics of lower-order schemes.
Figure: ensemble statistics; strong convergence.
  • This long-term research program is discussed, along with other results, in our recent book chapter.¹⁴
13. Grudzien, C., Bocquet, M., & Carrassi, A. (2020). On the numerical integration of the Lorenz-96 model, with scalar additive noise, for benchmark twin experiments. Geoscientific Model Development, 13(4), 1903-1924.
14. Carrassi, A., et al. (2021). Data assimilation for chaotic dynamics. In Volume IV of Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications. Springer.
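For context, the Lorenz-96 model with scalar additive noise can be written dx = f(x) dt + s dW, with f_i(x) = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F and periodic indices. The sketch below integrates it with the Euler-Maruyama scheme, a simple low-order method; it is not the second-order Taylor-Stratonovich scheme described above. The forcing F = 8, the step size, the noise level and the function names are illustrative assumptions.

```python
import numpy as np


def l96_drift(x, F=8.0):
    """Lorenz-96 vector field f_i(x) = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with periodic (cyclic) indices implemented by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F


def euler_maruyama_l96(x0, h, n_steps, s, rng, F=8.0):
    """Integrate dx = f(x) dt + s dW with the Euler-Maruyama scheme, where W
    is a standard Brownian motion of the same dimension as x."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Brownian increments over one step have variance h.
        dW = np.sqrt(h) * rng.standard_normal(x.shape)
        x = x + h * l96_drift(x, F) + s * dW
    return x


rng = np.random.default_rng(42)
x0 = 8.0 + 0.01 * rng.standard_normal(40)  # perturbed equilibrium, 40 variables
x_end = euler_maruyama_l96(x0, h=0.001, n_steps=5000, s=0.1, rng=rng)
```

With s = 0 the state x_i = F is an equilibrium of the drift, which gives a quick sanity check of the indexing.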

Efficient EnVAR for short-range forecasting

  • Ensemble-variational (EnVAR) methods form the basis of the state of the art in scalable DA;¹⁵ EnKF-based EnVAR estimators combine:
    • the accuracy of the iterative solution to the Bayesian maximum a posteriori (MAP) formulation;
    • the simplicity of model development and maintenance in ensemble DA;
    • the ensemble analysis of time-dependent errors; and
    • optimization of hyper-parameters with machine learning surrogate models.¹⁶
  • However, many designs are not cost-effective for short-range prediction.
  • I developed an outer-loop optimization of the EnVAR short-range DA cycle.¹⁷
Figure: SIEnKS.
  • We combine the ensemble Kalman smoother (EnKS) and the iterative EnKS into a single-iteration ensemble Kalman smoother (SIEnKS), which focuses iterative optimization on hyper-parameters and nonlinear observation operators.
15. Bannister, R. (2017). A review of operational methods of variational and ensemble-variational data assimilation, QJRMS, 143, 607–633.
16. Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L. (2020). Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, Foundations of Data Science, 2, 55–80.
17. Grudzien, C., & Bocquet, M. (in submission). A fast, single-iteration ensemble Kalman smoother for sequential data assimilation. Geoscientific Model Development.
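The SIEnKS itself is specified in the cited submission. As a hedged illustration of the EnKS building block it reuses, here is a minimal ensemble transform Kalman filter (ETKF) analysis for a linear observation operator, written as a right-multiplication G of the forecast ensemble. In the transform form of the EnKS, applying the same G to an ensemble of lagged states gives the retrospective smoother update. The function name, shapes and test matrices are assumptions of this sketch.

```python
import numpy as np


def etkf_right_transform(E, y, H, R):
    """Return G (N x N) such that E @ G is the ETKF analysis ensemble.

    E: forecast ensemble (n x N), y: observation (p,), H: linear observation
    operator (p x n), R: observation error covariance (p x p).
    """
    N = E.shape[1]
    x_mean = E.mean(axis=1)
    X = (E - x_mean[:, None]) / np.sqrt(N - 1)  # normalized anomalies
    Y = H @ X                                   # observed anomalies
    d = y - H @ x_mean                          # innovation
    Rinv = np.linalg.inv(R)
    # Hessian of the ensemble-space cost J(w) = |w|^2/2 + |d - Y w|^2_R / 2.
    hess = np.eye(N) + Y.T @ Rinv @ Y
    vals, vecs = np.linalg.eigh(hess)
    # Minimizer of J: w = hess^{-1} Y^T R^{-1} d.
    w = vecs @ ((vecs.T @ (Y.T @ Rinv @ d)) / vals)
    # Symmetric square root of hess^{-1} for the anomaly update.
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T
    ones = np.ones((N, 1))
    mean_part = ones @ ones.T / N               # E @ mean_part = x_mean 1^T
    centering = np.eye(N) - mean_part           # E @ centering = sqrt(N-1) X
    update = w[:, None] @ ones.T + np.sqrt(N - 1) * T
    return mean_part + centering @ update / np.sqrt(N - 1)


rng = np.random.default_rng(1)
n, N, p = 4, 6, 2                    # state dim, ensemble size, obs dim
E_lag = rng.standard_normal((n, N))  # ensemble at an earlier time in the lag window
M = rng.standard_normal((n, n))      # hypothetical linear, perfect model
E = M @ E_lag                        # forecast ensemble at the current time
H = rng.standard_normal((p, n))
R = np.diag([0.5, 2.0])
y = rng.standard_normal(p)

G = etkf_right_transform(E, y, H, R)
E_filter = E @ G       # filter analysis at the current time
E_smooth = E_lag @ G   # EnKS retrospective update of the lagged ensemble
```

With a linear, perfect model, propagating the smoothed lagged ensemble forward reproduces the filter analysis, i.e. `M @ E_smooth` equals `E_filter`.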

Efficient EnVAR for short-range forecasting

  • Our key result is an efficient multiple data assimilation (MDA) scheme within the EnKS cycle.
    • The filter step in this analysis is used as a boundary condition for the interpolation of the posterior over the lag window.
Figure: ensemble statistics.
  • In short-range forecasts, this MDA scheme is demonstrated to be more accurate, stable, and cost-effective than EnKF-based 4D-EnVAR schemes.
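The generic MDA idea can be sketched in isolation: the same observation is assimilated several times, each time with its error covariance inflated by a factor α_i chosen so that Σ 1/α_i = 1. In the linear-Gaussian case this reproduces a single Kalman update exactly; the motivation is the nonlinear case, where the posterior is approached in smaller steps. The sketch below only checks the linear-Gaussian identity; it is not the SIEnKS boundary-condition construction, and the function names and numbers are hypothetical.

```python
import numpy as np


def kalman_update(m, P, y, H, R):
    """Kalman analysis of mean m and covariance P given y = H x + noise."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return m + K @ (y - H @ m), P - K @ H @ P


def mda_update(m, P, y, H, R, alphas):
    """Assimilate y once per alpha, with observation error covariance alpha * R."""
    if not np.isclose(sum(1.0 / a for a in alphas), 1.0):
        raise ValueError("MDA weights must satisfy sum(1 / alpha) == 1")
    for a in alphas:
        m, P = kalman_update(m, P, y, H, a * R)
    return m, P


m0 = np.array([1.0, -0.5])
P0 = np.array([[2.0, 0.3], [0.3, 1.0]])
H = np.array([[1.0, 1.0]])
R = np.array([[0.4]])
y = np.array([2.0])

m_single, P_single = kalman_update(m0, P0, y, H, R)
m_mda, P_mda = mda_update(m0, P0, y, H, R, alphas=[2.0, 4.0, 4.0])
```

The equivalence follows because the posterior precision accumulates Σ Hᵀ(α_i R)⁻¹H = HᵀR⁻¹H, exactly as in one assimilation.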

Efficient EnVAR for short-range forecasting

  • Unlike traditional 4D-EnVAR approaches, the data boundary condition controls the forecast error accumulated over the lagged states, improving the forecast statistics.
Figure: ensemble statistics.
  • Our mathematical results are supported by extensive numerical demonstrations using DataAssimilationBenchmarks.jl, a Julia package I developed for my research, with contributions from student researchers.¹⁸
18. Grudzien, C., Sandhu, S., & Jridi, A. (2021, September 4). cgrudz/DataAssimilationBenchmarks.jl. Zenodo. In preparation for the Journal of Open Source Software.

Future research and professional goals

  • My future research interests include extending the SIEnKS formalism to realistic short-range prediction settings.

    • Both the AUS dynamical analysis for targeting observations and the SIEnKS for optimizing EnVAR are designed for short-range forecast dynamics, such as those of atmospheric rivers.
  • A growing opportunity in short-range prediction and nowcasting is the use of data-driven prediction systems based on deep learning;¹⁹

    • however, deep learning approaches require abundant, high-quality data to train models.
    • For atmospheric rivers, a purely data-driven approach may be challenging due to large data gaps.²⁰
  • Rather than replacing dynamical models entirely, a growing body of research points to a hybrid approach as a path forward:

    • using the Bayesian MAP formalism to combine DA, dynamical models, and deep learning surrogate models; and
    • training the surrogate model within the EnVAR DA cycle, reconstructing the signal from noisy and sparse data.²¹
  • The hybrid approach, furthermore, can use dynamical principles for data reconnaissance.

  • This research program runs in parallel with a longer-term book project I am developing with my collaborators.²²

19. Ravuri, S., Lenc, K., Willson, M., et al. (2021). Skilful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677.
20. Zheng, M., Delle Monache, L., et al. (2021). Data gaps within atmospheric rivers over the northeastern Pacific. BAMS, 102(3), E492-E524.
21. Brajard, J., Carrassi, A., Bocquet, M., & Bertino, L. (2020). Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. Journal of Computational Science, 44, 101171.
22. Carrassi, A., Grudzien, C., Bocquet, M., Farchi, A., & Raanes, P. Data assimilation for dynamical systems and their discovery through machine learning. Accepted book proposal, Springer Nature. Target submission in 2023.

Contribution to CW3E

  • My research covers most aspects of the theoretical DA problem, including:

    • modeling and simulation;
    • estimator diagnostics and validation; and
    • observer and estimator algorithm design.
  • Likewise, my work experience is adjacent to the operational aspects of the problem, including:

    • collaboration in a geophysics laboratory environment; and
    • software development and high-performance, networked computing.
  • Additionally, my teaching has provided me with:

    • extensive practice in presentation and communication; and
    • a strong foundation for performing statistical analysis and regression.
  • I believe the skill base I bring complements the existing operational strengths of the institute;

    • while I currently lack direct operational experience, I am eager to dive into this aspect of the DA problem and master these skills.
  • Likewise, I am eager to utilize my core strengths to develop novel techniques for hybrid DA-machine learning in short-range prediction cycles.

Thank you for your time and consideration!